1. Technical Field
The present invention relates to multimedia processing and image processing, and in particular to a technique of extracting contents that are similar to one another from among a plurality of contents and merging them.
2. Background Art
With the popularity of personal digital devices (e.g. digital cameras, digital video cameras), the number of personally recorded digital videos is increasing explosively. The reasons include, for example, the following: (1) storage devices available for storing recorded videos (i.e. video clips) have increased in capacity; (2) recorded videos contain only a single shot, and their recording durations (i.e. playback durations) are very short; and (3) such videos are related to various subjects or events. Users often need to maintain their own video collections captured at different locations and times. However, when the number of videos is very large, it is difficult for users to manage and manipulate them. For example, it is not easy for a user to find desired videos among a large number of videos that the user has captured.
Video summarization is a conventional technique for realizing efficient browsing of such a large number of videos. With video summarization, however, there is a risk of missing details when the features used for summarization are irrelevant to the story of the video. Moreover, it may be impossible to summarize the video when the features used for summarization are contained in almost all of the frames of the video. In such cases, the summarization may be inaccurate.
Besides the technique discussed above, there have been other techniques of composing (i.e. merging) videos (see Non-Patent Literature 1 through 3). For example, Non-Patent Literature 1 discloses a technique of composing a coherent video automatically when appropriate domain-specific metadata is associated with the video segments. The system disclosed in Non-Patent Literature 2 automatically selects home video segments and aligns them with music to create an edited video segment.
Here, the term “shot” used in the present Description means the most basic physical entity in a video, and refers to an uninterrupted video clip recorded by a single camera. The term “single shot” (or “short shot”) refers to an uninterrupted video clip with a relatively short duration.